Programming in CUDA for Kepler and Maxwell Architecture

Authors

  • Esteban Walter Gonzalez Clua
  • Marcelo Panaro de Moraes Zamith

Affiliations:
  1. Instituto de Computação, Universidade Federal Fluminense, Niterói RJ, Brazil
  2. Depto. de Ciência da Computação, Universidade Federal Rural do Rio de Janeiro, Nova Iguaçu RJ, Brazil

Abstract

Since the first version of CUDA was launched, GPU computing has improved considerably. Each new CUDA version has introduced important new features, bringing the platform ever closer to a typical high-performance parallel programming language. This tutorial presents the GPU architecture and the principles of CUDA, and explains the concepts behind new features introduced by NVIDIA, such as dynamic parallelism, unified memory, and concurrent kernels. The text also includes some optimization remarks for CUDA programs.
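The abstract lists dynamic parallelism, unified memory, and concurrent kernels among the features the tutorial covers. The CUDA sketch below is not taken from the paper; it is one minimal way to exercise all three in a single program, assuming a GPU of compute capability 3.5 or higher (dynamic parallelism is unavailable on earlier devices) and a build with relocatable device code. The kernel and function names (`fillKernel`, `parentKernel`, `childKernel`, `runDemo`) and the build line are illustrative assumptions.

```cuda
// Sketch only: all names are hypothetical, not from the paper.
// Assumed build (dynamic parallelism needs relocatable device code and
// compute capability 3.5+; pick an -arch your toolkit supports):
//   nvcc -arch=sm_52 -rdc=true cuda_features.cu -lcudadevrt -o cuda_features
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel, launched from the GPU by parentKernel: doubles each element.
__global__ void childKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Parent kernel: one device thread chooses the child's launch configuration
// and launches it (dynamic parallelism). A parent grid is not complete until
// its child grids finish, so no device-side synchronization is used here.
__global__ void parentKernel(float *data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        childKernel<<<blocks, threads>>>(data, n);
    }
}

// Independent kernel used to show concurrent execution in two streams.
__global__ void fillKernel(float *data, float value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Returns the number of wrong elements, or -1 if a CUDA call failed.
int runDemo(int n) {
    int result = -1;
    float *a = nullptr, *b = nullptr;
    cudaStream_t s1 = nullptr, s2 = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Unified memory: one pointer usable from both host and device code.
    if (cudaMallocManaged(&a, bytes) == cudaSuccess &&
        cudaMallocManaged(&b, bytes) == cudaSuccess &&
        cudaStreamCreate(&s1) == cudaSuccess &&
        cudaStreamCreate(&s2) == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;

        // Kernels in different streams have no ordering between them,
        // so the device may run them concurrently.
        fillKernel<<<blocks, threads, 0, s1>>>(a, 1.0f, n);
        fillKernel<<<blocks, threads, 0, s2>>>(b, 3.0f, n);

        // Same stream as the fill of a, so it starts only after that fill.
        parentKernel<<<1, 1, 0, s1>>>(a, n);

        // On pre-Pascal GPUs (e.g. Kepler, Maxwell) the host must not touch
        // managed memory while any kernel is running: wait for the device.
        if (cudaGetLastError() == cudaSuccess &&
            cudaDeviceSynchronize() == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i)
                if (a[i] != 2.0f || b[i] != 3.0f) ++result;
        }
    }
    if (s1) cudaStreamDestroy(s1);
    if (s2) cudaStreamDestroy(s2);
    cudaFree(a);  // cudaFree(nullptr) is a no-op
    cudaFree(b);
    return result;
}

int main() {
    int bad = runDemo(1 << 20);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Placing the two `fillKernel` launches in separate streams only permits overlap; whether they actually run concurrently depends on how many resources each grid leaves free on the device.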

Journal title:
  • RITA

Volume 22, Issue

Pages -

Publication date: 2015